Near-Optimal Bounds for Binary Embeddings of Arbitrary Sets
Authors
Abstract
We study embedding a subset K of the unit sphere into the Hamming cube {−1,+1}^m. We characterize the tradeoff between distortion and sample complexity m in terms of the Gaussian width ω(K) of the set. For subspaces and several structured-sparse sets we show that Gaussian maps provide the optimal tradeoff m ∼ δ^{-2} ω^2(K); in particular, for distortion δ one needs m ≈ δ^{-2} d, where d is the subspace dimension. For general sets, we provide sharp characterizations which reduce to m ≈ δ^{-4} ω^2(K) after simplification. We provide improved results for local embedding of points that are in close proximity to each other, which is related to locality-sensitive hashing. We also discuss faster binary embedding, where one takes advantage of an initial sketching procedure based on the Fast Johnson-Lindenstrauss Transform. Finally, we list several numerical observations and discuss open problems.
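As a rough illustration of the kind of map the abstract describes, the sketch below embeds unit vectors into {−1,+1}^m by taking the signs of a Gaussian linear map, then compares the normalized Hamming distance of two embeddings with the normalized angular distance of the original points. It is a minimal sketch and not the authors' code: the function names, the dimensions n = 20 and m = 4000, and the random seed are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_map(m, n, rng):
    """Draw an m x n matrix with i.i.d. standard normal entries (illustrative helper)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def binary_embed(A, x):
    """Map x to {-1,+1}^m: one sign bit per row of A."""
    return [1 if sum(a * xi for a, xi in zip(row, x)) >= 0 else -1 for row in A]


def normalized_hamming(u, v):
    """Fraction of coordinates in which the two sign vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)


def angular_distance(x, y):
    """Angle between x and y divided by pi, so the value lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cosine = max(-1.0, min(1.0, dot / (nx * ny)))
    return math.acos(cosine) / math.pi


def unit_vector(n, rng):
    """Random point on the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


# Assumed toy sizes: ambient dimension n and number of bits m.
rng = random.Random(0)
n, m = 20, 4000
A = gaussian_map(m, n, rng)
x, y = unit_vector(n, rng), unit_vector(n, rng)

hamming = normalized_hamming(binary_embed(A, x), binary_embed(A, y))
angle = angular_distance(x, y)
gap = abs(hamming - angle)
print(f"Hamming: {hamming:.3f}  angular: {angle:.3f}  gap: {gap:.3f}")
```

Each bit differs between the two points with probability equal to their angle divided by π, so the observed gap shrinks as m grows; how large m must be for a whole set K, as a function of the distortion, is the question the abstract addresses.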
Similar resources
Near-lossless Binarization of Word Embeddings
Is it possible to learn binary word embeddings of arbitrary size from their real-value counterparts with (almost) no loss in task performance? If so, inferences performed in downstream NLP applications would benefit from a massive speed-up brought by binary representations. In this paper, we derive an autoencoder architecture to learn semantic-preserving binary embeddings from existing real-value ones...
Binary Embedding: Fundamental Limits and Fast Algorithm
Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for arbitrary N distinct points in S, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δ uniform distortion. Existing bina...
Dimension Reduction Algorithms for Near-Optimal Low-Dimensional Embeddings and Compressive Sensing
In this thesis, we establish theoretical guarantees for several dimension reduction algorithms developed for applications in compressive sensing and signal processing. In each instance, the input is a point or set of points in d-dimensional Euclidean space, and the goal is to find a linear function from R^d into R^k, where k << d, such that the resulting embedding of the input pointset into k-di...
Universal Approximation of Interval-valued Fuzzy Systems Based on Interval-valued Implications
It is firstly proved that the multi-input-single-output (MISO) fuzzy systems based on interval-valued $R$- and $S$-implications can approximate any continuous function defined on a compact set to arbitrary accuracy. A formula to compute the lower upper bounds on the number of interval-valued fuzzy sets needed to achieve a pre-specified approximation accuracy for an arbitrary multivariate con...
New Bounds and Optimal Binary Signature Sets–Part I: Periodic Total Squared Correlation
We derive new bounds on the periodic (cyclic) total squared correlation (PTSC) of binary antipodal signature sets for any number of signatures K and any signature length L. Optimal designs that achieve the new bounds are then developed for several (K,L) cases. As an example, it is seen that complete (K = L + 2) Gold sets are PTSC optimal, but not necessarily Gold subsets of K < L + 2 signatur...
Journal title:
- CoRR
Volume abs/1512.04433 Issue
Pages -
Publication date 2015